Centrality scaling in large networks 
Betweenness centrality lies at the core of both transport and structural vulnerability properties
of complex networks; however, it is computationally costly, and its measurement for networks with
millions of nodes is nearly impossible. By introducing a multiscale decomposition of shortest paths,
we show that the contributions to betweenness coming from geodesics not longer than L obey a
characteristic scaling vs. L, which can be used to predict the distribution of the full centralities. The
method is also illustrated on a real-world social network of $5.5 \times 10^6$ nodes and $2.7 \times 10^7$ links.
Many complex networks evolve organically, without any centralized control or design, and for this reason
intense research has been devoted to understanding their performance properties and, more importantly,
their vulnerabilities and failure modes. In these studies, a fundamental role is played by centrality
measures (originally introduced in the social sciences [1–5]), and in particular by betweenness
centrality [6–8]. The betweenness centrality (BC) of a node (edge) is defined as the fraction of all
geodesics (shortest paths) passing through that node (edge). Since transport tends to minimize the
cost/time of the route from source to destination, geodesics, and hence centrality measures and their
distributions, strongly determine overall transport performance. Interestingly, geodesics are important
not only for network flows but also for structural connectivity: removing nodes (edges) with high
centrality produces a rapid increase in diameter and, eventually, the structural breakup of the graph.
Analyses of traffic or information flow [7, 9–14], network vulnerability to attacks [15], cascading
failures [16, 17], and epidemics [18] all involve betweenness calculations.

Unfortunately, computing betweenness is very costly [13, 19–22], and for large networks with millions
to billions of nodes it is nearly impossible, so approximation methods are needed. Existing
approximations [23, 24], however, are sampling-based and poorly controlled.

Here we show that when geodesics are restricted to a maximum length L, the corresponding range-limited
L-betweenness (introduced by Borgatti and Everett as bounded-distance betweenness [5]) assumes, for large
graphs, a characteristic scaling form as a function of L. This scaling can then be used to predict the
betweenness distribution in the (usually unattainable) diameter limit and, to a good approximation, the
ranking of nodes/edges by betweenness. Additionally, the range-limited method generates l-betweenness
values for all nodes and edges and for all $1 \le l \le L$, providing systematic information on geodesics
on all length scales. This is of interest in its own right when the transported entity has a small
transmission probability (rumors, viruses) and thus a high attrition rate, so that it does not explore
longer geodesics. As we show, the L-betweenness scaling is already achieved for relatively small L
values, and increasingly less new information on the BC distribution and ranking is obtained when going
from L to L + 1. The computational overhead of the $L \to L+1$ step, however, is usually immense. The
range-limited centrality algorithm presented here, even in the diameter limit (L = D), has no larger
complexity than the currently known fastest algorithms by Brandes [19] and Newman [20], that is $O(NM)$,
where N is the number of nodes and M is the number of (directed) edges, and it is fully parallelizable.
For L < D our algorithm runs sublinearly in $O(NM)$, since only the L-range neighborhood of each root is
explored, making it possible to study networks with millions of nodes. As an illustration, we analyzed
a social network (SocNet) inferred from mobile phone trace-logs [25], having a giant cluster with
N = 5,568,785 and M = 26,822,764. For this network we calculated all L-betweenness centralities (L-BCs)
for all nodes and edges up to L = 5 in 6 days, on 10 processors. With increasing L the ranking of the
highest-BC nodes freezes, and one can predict the top nodes early. The number of geodesics running
through these nodes, however, explodes with L. For example, while the node with the highest centrality
for L = 4 has 40,084,702 geodesics passing through it, for L = 5 it has 500,903,498.
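The claim that the ranking of the top nodes freezes with increasing L can be tracked with two simple statistics computed from consecutive range-limited values: the overlap of the top-k sets and the Spearman rank correlation. The following minimal Python sketch rests on our own assumptions. The $B_L$ values are given as dictionaries keyed by node, the helper names (`average_ranks`, `spearman`, `top_k_overlap`) are hypothetical, and the toy numbers are made up rather than taken from the SocNet data.

```python
from typing import Dict, Hashable, List

def average_ranks(values: List[float]) -> List[float]:
    """Rank values in decreasing order (rank 1 = largest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: -values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # 1-based mean rank of the tied block
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks

def spearman(b_a: Dict[Hashable, float], b_b: Dict[Hashable, float]) -> float:
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    nodes = list(b_a)  # the same node set is assumed in both dictionaries
    ra = average_ranks([b_a[v] for v in nodes])
    rb = average_ranks([b_b[v] for v in nodes])
    n = len(nodes)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

def top_k_overlap(b_a: Dict[Hashable, float], b_b: Dict[Hashable, float], k: int) -> float:
    """Fraction of the k highest-betweenness nodes shared by two rankings."""
    top_a = set(sorted(b_a, key=b_a.get, reverse=True)[:k])
    top_b = set(sorted(b_b, key=b_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k

# Hypothetical B_4 and B_5 values for five nodes (toy numbers)
B4 = {"u": 40.0, "v": 25.0, "w": 9.0, "x": 3.0, "y": 1.0}
B5 = {"u": 500.0, "v": 310.0, "w": 80.0, "x": 45.0, "y": 12.0}
```

In this toy case the values grow by an order of magnitude from L = 4 to L = 5, yet the ranking is unchanged. Both statistics then equal 1, which is the kind of "frozen" ranking described above.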

Calculating the betweenness centrality of a node or edge in a directed graph $G(V,E)$ requires counting
the number of all-pair shortest directed paths incident on it. Here we include end-points; however, the
algorithm can easily be changed to exclude them or to produce other variants. The stress centrality (SC)
$S(i)$ of a node $i \in V$ is simply the sum of the total number $\sigma_{mn}(i)$ of shortest directed
paths from node m to n going through i: $S(i) = \sum_{m,n \in V} \sigma_{mn}(i)$. Betweenness centrality
(BC) [6, 8] normalizes the number of paths through a node by the total number of paths $\sigma_{mn}$ for
a given source-destination pair $(m,n)$: $B(i) = \sum_{m,n \in V} \sigma_{mn}(i)/\sigma_{mn}$. Similar
quantities can be defined for an edge $(j,k) \in E$: $S(j,k) = \sum_{m,n \in V} \sigma_{mn}(j,k)$ and
$B(j,k) = \sum_{m,n \in V} \sigma_{mn}(j,k)/\sigma_{mn}$.
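To make these definitions concrete, here is a brute-force Python sketch that evaluates S and B for the nodes and edges of a small directed graph, with end-points included as in the text. It is our own reference code, not the paper's algorithm. It relies on the standard identity $\sigma_{mn}(i) = \sigma_{mi}\sigma_{in}$ whenever $d(m,i) + d(i,n) = d(m,n)$, together with its edge analogue. It sums over ordered pairs $m \ne n$ with n reachable from m, which is our reading of the sum over $m, n \in V$. The adjacency-dict format and the five-node example graph are illustrative assumptions. The cost is $O(N^3)$, so it is only meant for checking small cases.

```python
from collections import deque

def bfs_counts(adj, src):
    """Shortest-path distances and path counts sigma from src in a directed graph."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:  # first time w is reached: it lies one level deeper
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w on a geodesic
                sigma[w] += sigma[u]
    return dist, sigma

def stress_and_betweenness(adj):
    """Node and edge S, B summed over ordered pairs m != n (end-points included)."""
    nodes = set(adj) | {w for succ in adj.values() for w in succ}
    info = {v: bfs_counts(adj, v) for v in nodes}
    S = {v: 0 for v in nodes}
    B = {v: 0.0 for v in nodes}
    S_edge, B_edge = {}, {}
    for m in nodes:
        dist_m, sig_m = info[m]
        for n in nodes:
            if n == m or n not in dist_m:
                continue
            d_mn, s_mn = dist_m[n], sig_m[n]
            for i in nodes:  # node i lies on a geodesic m -> n
                dist_i, sig_i = info[i]
                if i in dist_m and n in dist_i and dist_m[i] + dist_i[n] == d_mn:
                    S[i] += sig_m[i] * sig_i[n]
                    B[i] += sig_m[i] * sig_i[n] / s_mn
            for j, succ in adj.items():  # edge (j, k) lies on a geodesic m -> n
                for k in succ:
                    dist_k, sig_k = info[k]
                    if j in dist_m and n in dist_k and dist_m[j] + 1 + dist_k[n] == d_mn:
                        S_edge[(j, k)] = S_edge.get((j, k), 0) + sig_m[j] * sig_k[n]
                        B_edge[(j, k)] = B_edge.get((j, k), 0.0) + sig_m[j] * sig_k[n] / s_mn
    return S, B, S_edge, B_edge

# Hypothetical example: two geodesics i->a->c and i->b->c, followed by c->n
DIAMOND = {"i": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["n"]}
S, B, S_edge, B_edge = stress_and_betweenness(DIAMOND)
```

For example, node c lies on every geodesic of the seven source-destination pairs that involve it, so $B(c) = 7$, while $S(c) = 9$ because the pairs (i, c) and (i, n) each have two geodesics.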
FIG. 1: a) Shells of the $C_3$ subgraph of node i (black) are colored red, blue, and green. Grey
elements are not part of the subgraph. b) Eq. (1) calculates the SC of a node in $G_l$ (blue) by
summing the SC of all its predecessors in $G_{l-1}(i)$ (red), e.g.,
$s_l^l(i|j) = s_{l-1}^{l-1}(i|k) + s_{l-1}^{l-1}(i|m)$. c) Eqs. (2) and (3) are based on the
observations $\sigma_{in}(j,k) = s_r^r(i|j)\,\sigma_{kn}$ and $\sigma_{in}(k) = s_{r+1}^{r+1}(i|k)\,\sigma_{kn}$.
Eq. (4) calculates the fixed-l centralities for a node (red) in $G_r(i)$ by summing the corresponding
centralities of its outgoing links (blue) in $G_{r+1}(i)$, e.g.,
$s_r^l(i|j) = s_{r+1}^l(i|j,k) + s_{r+1}^l(i|j,m)$.

In order to define range-limited quantities, let $s_l(j)$ and $b_l(j)$ denote the stress and
betweenness centralities of a node j for all-pair shortest directed paths of fixed length l. Then
$S_L(j) = \sum_{l=1}^{L} s_l(j)$ and $B_L(j) = \sum_{l=1}^{L} b_l(j)$ represent the centralities from
paths not longer than L. Similar measures for an edge are defined in the same way. Like virtually all
centrality algorithms, our method calculates these quantities for a node j for shortest directed paths
all emanating from a "root" node i, and then sums the obtained values over all $i \in V$ to get the
final centralities for j (similarly for edges). While the basic concept of our algorithm is similar to
Brandes' [19] and Newman's [20], we derive recursions that simultaneously compute both SC and BC for all
nodes and edges and for all values $l = 1, \dots, L$. The algorithm's output thus generates detailed and
systematic information about shortest paths in a graph on all length scales, providing a tool for
multiscale network analysis.
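On small graphs, the decomposition into fixed-length contributions can be checked directly by grouping source-destination pairs according to their distance $l = d(m,n)$. The brute-force Python sketch below is again our own reference code, not the recursive algorithm of the paper. It assumes the adjacency-dict format and end-point convention used above, and the names `fixed_length_betweenness` and `range_limited` are hypothetical. Cumulative sums then give $B_L(j)$.

```python
from collections import deque, defaultdict

def bfs_counts(adj, src):
    """Distances and numbers of shortest directed paths from src."""
    dist, sigma = {src: 0}, {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def fixed_length_betweenness(adj):
    """b[l][j]: betweenness of j from geodesics of length exactly l (end-points included)."""
    nodes = set(adj) | {w for succ in adj.values() for w in succ}
    info = {v: bfs_counts(adj, v) for v in nodes}
    b = defaultdict(lambda: defaultdict(float))
    for m in nodes:
        dist_m, sig_m = info[m]
        for n, l in dist_m.items():  # l = d(m, n) is the geodesic length
            if n == m:
                continue
            for j in nodes:
                dist_j, sig_j = info[j]
                if j in dist_m and n in dist_j and dist_m[j] + dist_j[n] == l:
                    b[l][j] += sig_m[j] * sig_j[n] / sig_m[n]
    return b

def range_limited(b, L):
    """B_L(j) = sum over l = 1..L of b_l(j)."""
    total = defaultdict(float)
    for l in range(1, L + 1):
        for j, value in b.get(l, {}).items():
            total[j] += value
    return total

DIAMOND = {"i": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["n"]}  # hypothetical example
b = fixed_length_betweenness(DIAMOND)
```

For L equal to the diameter (3 here), `range_limited(b, 3)` reproduces the full betweenness, e.g. $B_3(c) = 3 + 3 + 1 = 7$.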

The algorithm starts from a given root i and builds the L-range subgraph $C_L$ containing all nodes
that can be reached in at most L steps from i. Only links that are part of the shortest paths starting
from the root are included in $C_L$. We decompose $C_L$ into shells $G_l(i)$ containing all the nodes at
shortest-path distance l from the root and all incoming edges from shell $l-1$ (Fig. 1a). The root
itself is considered to be shell $G_0(i)$.
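The shell decomposition can be sketched in Python as a breadth-first search limited to depth L. For each shell $G_l(i)$, it records the nodes and the incoming edges from shell $l-1$ that lie on shortest paths from the root. The function name and the dictionary layout of the returned shells are our own choices for illustration.

```python
from collections import deque

def shell_decomposition(adj, root, L):
    """Shells G_0(root), ..., G_L(root) of the L-range subgraph C_L."""
    dist = {root: 0}
    shells = [{"nodes": [root], "edges": []}]  # G_0 is the root itself
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == L:
            continue  # do not explore beyond the L-range subgraph
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] == len(shells):
                    shells.append({"nodes": [], "edges": []})
                shells[dist[w]]["nodes"].append(w)
                queue.append(w)
            if dist[w] == dist[u] + 1:  # keep only shortest-path edges into the next shell
                shells[dist[w]]["edges"].append((u, w))
    return shells

DIAMOND = {"i": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["n"]}  # hypothetical example
SHELLS = shell_decomposition(DIAMOND, "i", 3)
```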

Let $s_r^l(i|j) = \sum_{n \in G_l(i)} \sigma_{in}(j)$ denote the number of shortest directed paths of
length l from the root through node j in the r-th shell, $j \in G_r(i)$, and let
$s_r^l(i|j,k) = \sum_{n \in G_l(i)} \sigma_{in}(j,k)$ describe the same quantity for an edge (j,k) in the
r-th shell, $(j,k) \in G_r(i)$. We define similar quantities for betweenness as
$b_r^l(i|j) = \sum_{n \in G_l(i)} \sigma_{in}(j)/\sigma_{in}$ and
$b_r^l(i|j,k) = \sum_{n \in G_l(i)} \sigma_{in}(j,k)/\sigma_{in}$. Then $s_l(j) = \sum_{i \in V} s_r^l(i|j)$
and $b_l(j) = \sum_{i \in V} b_r^l(i|j)$, with similar equations for edges. In these sums r is not an
independent variable: given i and j, it is the radius of the shell $G_r(i)$ centered on i and
containing j. One can show that the following recursions hold (see also Fig. 1):

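$$s_l^l(i|j) = \sum_{k} s_{l-1}^{l-1}(i|k), \qquad b_l^l(i|j) = 1, \qquad (1)$$

where the sum runs over the predecessors $k \in G_{l-1}(i)$ of j, i.e., over the edges $(k,j) \in G_l(i)$;

$$s_{r+1}^{l}(i|j,k) = s_{r+1}^{l}(i|k)\, s_r^r(i|j)/s_{r+1}^{r+1}(i|k), \qquad (2)$$

$$b_{r+1}^{l}(i|j,k) = b_{r+1}^{l}(i|k)\, s_r^r(i|j)/s_{r+1}^{r+1}(i|k), \qquad (3)$$

$$s_r^l(i|j) = \sum_{k} s_{r+1}^{l}(i|j,k), \qquad b_r^l(i|j) = \sum_{k} b_{r+1}^{l}(i|j,k), \qquad (4)$$

where in Eq. (4) the sums run over the outgoing links $(j,k) \in G_{r+1}(i)$ of j.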
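As a worked check of Eqs. (1)-(4), consider a hypothetical five-node directed graph with edges i→a, i→b, a→c, b→c, c→n. With root i, the shells are $G_1(i) = \{a, b\}$, $G_2(i) = \{c\}$, and $G_3(i) = \{n\}$, and we take $s_0^0(i|i) = 1$ for the root (our convention). The two geodesics of length 3 from i to n should then split evenly between a and b:

```latex
\begin{align*}
s_1^1(i|a) &= s_0^0(i|i) = 1, \qquad s_2^2(i|c) = s_1^1(i|a) + s_1^1(i|b) = 2,
  \qquad s_3^3(i|n) = s_2^2(i|c) = 2 && \text{Eq. (1)}\\
s_3^3(i|c,n) &= s_3^3(i|n)\,\frac{s_2^2(i|c)}{s_3^3(i|n)} = 2, \qquad
  b_3^3(i|c,n) = b_3^3(i|n)\,\frac{s_2^2(i|c)}{s_3^3(i|n)} = 1 && \text{Eqs. (2), (3)}\\
s_2^3(i|c) &= s_3^3(i|c,n) = 2, \qquad b_2^3(i|c) = b_3^3(i|c,n) = 1 && \text{Eq. (4)}\\
s_2^3(i|a,c) &= s_2^3(i|c)\,\frac{s_1^1(i|a)}{s_2^2(i|c)} = 1, \qquad
  b_2^3(i|a,c) = b_2^3(i|c)\,\frac{s_1^1(i|a)}{s_2^2(i|c)} = \tfrac12 && \text{Eqs. (2), (3)}\\
s_1^3(i|a) &= s_2^3(i|a,c) = 1, \qquad b_1^3(i|a) = b_2^3(i|a,c) = \tfrac12 && \text{Eq. (4)}
\end{align*}
```

Exactly one of the two geodesics from i to n passes through a, which is what $s_1^3(i|a) = 1$ and $b_1^3(i|a) = 1/2$ express.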

The steps below are repeated for $l = 1, \dots, L$: 1) Build $G_l(i)$ using breadth-first search.
2) Calculate the l-centrality measures ($s_l^l(i|j)$, $b_l^l(i|j)$) of all nodes in $G_l(i)$.
3) Moving backwards through $r = l-1, \dots, 1, 0$, calculate the fixed-l centralities of links in
$G_{r+1}(i)$ and of nodes in $G_r(i)$ using recursions (1)–(4). Then return to step 1) until the last
shell $G_L(i)$ is reached. In the end, we obtain the fixed-l betweenness values of all nodes and edges
in $C_L$. This concludes the basic algorithm, which can be modified to compute different variants of BC
and SC, such as excluding endpoints. Similar recursions can also be derived for load and closeness
centrality [7, 21].
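The steps above can be sketched in Python for an unweighted directed graph. This is our own minimal, single-process reading of recursions (1)-(4), not the authors' parallel implementation. It merges Eqs. (2)-(4) into one backward sweep per length l and includes end-points. The function name and data layout are hypothetical. Each root costs one BFS limited to depth L plus one backward sweep per l, and different roots are independent of each other.

```python
from collections import deque, defaultdict

def range_limited_centralities(adj, L):
    """Fixed-length stress/betweenness s[l][v], b[l][v] and edge versions for l = 1..L."""
    nodes = set(adj) | {w for succ in adj.values() for w in succ}
    s, b = defaultdict(lambda: defaultdict(float)), defaultdict(lambda: defaultdict(float))
    s_edge, b_edge = defaultdict(lambda: defaultdict(float)), defaultdict(lambda: defaultdict(float))
    for root in nodes:
        # Depth-limited BFS: shells G_r(root), r <= L, and sigma[j] = s_r^r(root|j) (Eq. 1)
        dist, sigma, shells = {root: 0}, {root: 1}, [[root]]
        queue = deque([root])
        while queue:
            u = queue.popleft()
            if dist[u] == L:
                continue  # nodes beyond C_L are never visited
            for w in adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    sigma[w] = 0
                    if dist[w] == len(shells):
                        shells.append([])
                    shells[dist[w]].append(w)
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
        for l in range(1, len(shells)):
            # Paths of length l end in shell l: s_l^l = sigma, b_l^l = 1 (Eq. 1)
            s_cur = {n: float(sigma[n]) for n in shells[l]}
            b_cur = {n: 1.0 for n in shells[l]}
            for n in shells[l]:
                s[l][n] += s_cur[n]
                b[l][n] += b_cur[n]
            for r in range(l - 1, -1, -1):  # backward sweep r = l-1, ..., 0
                s_prev, b_prev = {}, {}
                for j in shells[r]:
                    sj = bj = 0.0
                    for k in adj.get(j, ()):
                        if k not in s_cur:
                            continue  # (j, k) is not a shortest-path edge into shell r+1
                        ratio = sigma[j] / sigma[k]  # s_r^r(i|j) / s_{r+1}^{r+1}(i|k)
                        se, be = s_cur[k] * ratio, b_cur[k] * ratio  # Eqs. (2), (3)
                        s_edge[l][(j, k)] += se
                        b_edge[l][(j, k)] += be
                        sj += se  # Eq. (4): sum over outgoing shortest-path edges
                        bj += be
                    s_prev[j], b_prev[j] = sj, bj
                    s[l][j] += sj
                    b[l][j] += bj
                s_cur, b_cur = s_prev, b_prev
    return s, b, s_edge, b_edge

DIAMOND = {"i": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["n"]}  # hypothetical example
s, b, s_edge, b_edge = range_limited_centralities(DIAMOND, L=3)
B3 = {v: sum(b[l][v] for l in range(1, 4)) for v in "iabcn"}  # B_L(j) for L = 3
```

On this toy graph the cumulative values agree with a brute-force count from the definitions, e.g. $B_3(c) = 7$ and $B_3(a) = 4$.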

The L-betweenness values on large networks obey a scaling behavior as a function of L. In Fig. 2 we plot
the distribution of node betweenness values measured on the Erdős–Rényi (ER) random graph [26], the
Barabási–Albert (BA) scale-free model [27], the random geometric graph (RG) [28], and the large social
network (SocNet) [25]. Since in large networks $B_L$ grows quickly, it is better to work with the
distribution $Q_L$ of the $\ln B_L$ values than with the distribution $P_L$ of the $B_L$ values; note that
$Q_L(\ln B) = B\,P_L(B)$. As shown in the insets of Fig. 2, the distributions $Q_L(\ln B)$ for different L
can be rescaled onto each other by plotting $Q = \sigma_L Q_L$ vs. $u = [\ln(B) - \mu_L]/\sigma_L$, where
$\mu_L$ and $\sigma_L$ are the mean and the standard deviation of $\ln B_L$. These networks were chosen to
represent very different graph classes: the ER, BA and SocNet have small diameters, while the RG has no
shortcuts. The RG is spatially embedded (d = 2), unlike ER and BA; the SocNet, however, is influenced by
the spatial embedding of people's motility [25]. While BA has a power-law degree distribution
$P(k) \sim k^{-3}$, both ER and RG have a Poissonian P(k), and the SocNet's P(k) resembles a
log-normal [31, 32]. Both RG and SocNet have high clustering, unlike the others.
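The rescaling used in the insets of Fig. 2 amounts to standardizing $\ln B_L$. If $\mu_L$ and $\sigma_L$ are the mean and standard deviation of $\ln B_L$, then the variable $u = (\ln B - \mu_L)/\sigma_L$ has density $\sigma_L Q_L(\ln B)$, so histograms of u for different L can be overlaid directly. A minimal standard-library Python sketch follows. It assumes the $B_L$ values for one L are given as a list of positive numbers, which holds since end-points are counted. The function names and the fixed binning range $[-4, 4)$ are our own choices.

```python
import math
import statistics

def rescale_log_betweenness(values):
    """Map B_L values to u = (ln B - mu_L) / sigma_L; also return mu_L and sigma_L."""
    logs = [math.log(v) for v in values]  # B_L > 0 because end-points are counted
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return [(x - mu) / sigma for x in logs], mu, sigma

def density_histogram(samples, lo=-4.0, hi=4.0, bins=16):
    """Bin centers and normalized density of the samples (samples outside [lo, hi) are ignored)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        if lo <= x < hi:
            counts[int((x - lo) / width)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, [c / (len(samples) * width) for c in counts]

# Toy input with ln B = 0, 1, 2 (not data from the paper)
U_TOY, MU_TOY, SIGMA_TOY = rescale_log_betweenness([1.0, math.e, math.e ** 2])
CENTERS, Q_TOY = density_histogram(U_TOY)
```

Plotting the second output of `density_histogram` against `CENTERS` for several L gives the collapsed curves. Comparing it with a standard normal density is one simple way to quantify the lognormal fits shown for the SocNet.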

FIG. 2: Distribution $Q_L$ of L-betweenness for different values of L. a) ER, N = 5 × 10^…, ⟨k⟩ = 4,
diameter D = 16; b) BA, N = 5 × 10^…, m = 3, D = …; c) SocNet, N = 5,568,785, M = 26,822,764, with the
distributions fitted by a lognormal (black dashed curves); d) RG, N = 10^…, ⟨k⟩ = 15, D = 79. The
insets show the rescaled distributions (see text).

Next we show that the scaling behavior observed for range-limited centralities in large graphs is a
consequence of the scaling of shell sizes, which has been shown to exist, e.g., in random graphs with
arbitrary degree distributions [29, 30]. Here we present arguments for undirected, uncorrelated graphs
and deal only with BC, extensions to the other centralities mentioned above being straightforward. Let
us define $\langle\cdot\rangle$ as an average over all root nodes i in the graph. If $z_l(i)$ denotes the
number of nodes in shell $G_l(i)$, then we model the growth of shell sizes by a branching-like process
$z_{l+1}(i) = z_l(i)\,a_l\,[1 + \epsilon_l(i)]$, where $a_l = \langle z_{l+1}\rangle/\langle z_l\rangle$ is
the branching factor at the l-th shell and $\epsilon_l(i)$ is a per-node shell-occupancy noise term,
$|\epsilon_l| \ll 1$, assumed to obey $\langle\epsilon_l\rangle = 0$ and
$\langle\epsilon_l(i)\,\epsilon_m(i)\rangle = \Delta_l\,\delta_{lm}$, with $\Delta_l$ decreasing with l, as
supported by numerical evidence. For undirected paths we can write

$$b_{l+1}(j) = \frac{1}{2}\sum_{i \in V} b_r^{l+1}(i|j) = \frac{1}{2}z_{l+1}(j) + \frac{1}{2}u_{l+1}(j) + \frac{1}{2}z_{l+1}(j), \qquad u_{l+1}(j) = \sum_{m=1}^{l}\sum_{i \in G_m(j)} b_m^{l+1}(i|j),$$

where the first and last terms come from the paths starting ($i = j$) and ending ($i \in G_{l+1}(j)$) at
j, and we used the fact that in undirected graphs $i \in G_m(j) \Leftrightarrow j \in G_m(i)$. Note that the
number of terms in the inner sum $\sum_{i \in G_m(j)} b_m^{l+1}(i|j)$ is $z_m(j)$, which increases rapidly
with m, so $u_{l+1}(j)$ is expected to depend only weakly on j. Accordingly, we may approximate
$u_{l+1}(j) \simeq \sum_{m=1}^{l}\sum_{i \in G_m(j)} v_m^{l+1}(i)$, where $v_m^{l+1}(i)$ is an average
betweenness computed on a shell of radius m centered on node i:
$v_m^{l+1}(i) = [\sum_{k \in G_m(i)} b_m^{l+1}(i|k)]/z_m(i)$. Based on the observation that
$\sum_{k \in G_m(i)} b_m^{l+1}(i|k) = z_{l+1}(i)$ (every shortest path from i to a node of $G_{l+1}(i)$
crosses shell $G_m(i)$ exactly once), we can write $v_m^{l+1}(i) = z_{l+1}(i)/z_m(i)$. Using the
recursion defined above for $z_{l+1}(i)$ as a branching process and neglecting the small noise term, we
obtain $u_{l+1}(j) \simeq a_l\sum_{m=1}^{l}\sum_{i \in G_m(j)} z_l(i)/z_m(i)$. This allows us to write a
recursion for $b_{l+1}(j)$ as $b_{l+1}(j) \simeq a_l[b_l(j) + z_l(j)/2 + z_l(j)\,\epsilon_l(j)]$, which
can be iterated down to l = 1, where $b_1(j) = z_1(j) = k_j$ is the degree of j:



$$b_l(j) \simeq \beta_l\, k_j\, e^{\xi_l(j)}, \qquad (5)$$

where $\beta_l$ depends only on l (through the branching factors $a_m$), with $\beta_1 = 1$, and
$\xi_l(j)$ is a weighted sum of the noise terms $\epsilon_m(j)$, $m = 1, \dots, l-1$, with $\xi_1(j) = 0$.

FIG. 3: a) $b_l$ (circles) and $B_l$ (stars) vs. l for a node j in SocNet (red) and ER (blue). b) Same as
a) for RG, for two arbitrary nodes i and j. $B_{L+1}$ vs. $B_L$ for c) SocNet and d) RG; each dot
corresponds to a node. Ranking by BC vs. L for the top 10 nodes in e) SocNet and f) RG (from Fig. 1).

Eq. (5) allows us to relate the statistics of fixed-l betweenness to the statistics of shell
occupancies. Since the noise term (calculated from per-node occupancy deviations on a shell) is
independent of the root degree, the distribution of fixed-l betweenness can be expressed as

$$p_l(b) = \frac{1}{b}\int_1^{N-1} dk\, P(k)\, \Phi_l(\ln b - \ln\beta_l - \ln k), \qquad (6)$$

where P(k) is the degree distribution and $\Phi_l(\xi)$ is the distribution of the noise $\xi_l(j)$,
peaked at $\xi = 0$, with fast-decaying tails and $\Phi_1(x) = \delta(x)$. From Eq. (6) it follows that the
natural scaling variable for the betweenness distribution is $u = \ln b - \ln\beta_l$. An extra
l-dependence comes from the noise through the width $\sigma_l$ of $\Phi_l$ (for l > 1), which can easily
be accounted for by the rescaling $u \to u/\sigma_l$, $p_l \to p_l\,\sigma_l$, collapsing the
distributions for different l values onto the same functional form. As $\Phi_l$ is sharply peaked around
0, the most significant contribution to the integral (6) for a given b comes from degrees
$k \simeq b/\beta_l$. Since $k \ge 1$, we have a rapid decay of $p_l(b)$ in the range $b < \beta_l$, a
maximum at $b = \beta_l\hat{k}$, where $\hat{k}$ is the degree at which P(k) is maximal, and a sharp decay
for $b > (N-1)\beta_l$.
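Equations (5) and (6) can be explored numerically under three simplifying assumptions of our own: a constant branching factor $a_l = a$, the noise dropped when computing $\beta_l$, and a Gaussian of width $\sigma_l$ standing in for $\Phi_l$ (the text only states that $\Phi_l$ is peaked at 0 with fast-decaying tails). Iterating $b_{l+1} = a[b_l + z_l/2]$ with $z_l = k_j a^{l-1}$ and $b_1 = k_j$ then gives $\beta_l = (l+1)a^{l-1}/2$, which grows as $a^l$ up to a prefactor linear in l. The sketch below replaces the integral over k in Eq. (6) by a sum over a discrete degree distribution and returns $Q_l(\ln b) = b\,p_l(b)$. The function names are hypothetical.

```python
import math
from typing import Dict, List

def beta_noiseless(a: float, L: int) -> List[float]:
    """beta_1..beta_L from b_{l+1} = a [b_l + z_l / 2] with the noise dropped (assumption)."""
    betas = [1.0]  # beta_1 = 1, since b_1(j) = k_j
    z = 1.0        # z_l / k_j, equal to a**(l-1)
    for _ in range(1, L):
        betas.append(a * (betas[-1] + z / 2))
        z *= a
    return betas

def log_b_density(x: float, beta: float, pk: Dict[int, float], width: float) -> float:
    """Q_l(ln b) = b p_l(b) at x = ln b: Eq. (6) with a discrete P(k) and a Gaussian Phi_l."""
    norm = 1.0 / (width * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k, p in pk.items():
        xi = x - math.log(beta) - math.log(k)  # noise value needed to reach b from degree k
        total += p * norm * math.exp(-xi * xi / (2.0 * width * width))
    return total

betas = beta_noiseless(a=3.0, L=4)
closed_form = [(l + 1) * 3.0 ** (l - 1) / 2 for l in range(1, 5)]
```

With a toy degree distribution such as `{2: 0.5, 5: 0.5}`, `log_b_density` shows two peaks at $\ln(\beta_l k)$. Its integral over $\ln b$ is 1, as expected for a density in $\ln b$.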
In many networks (ER, BA, and also the SocNet) the shell size grows exponentially, that is,
$a_l \simeq a = \langle z_2\rangle/\langle k\rangle$, until l reaches the average shortest-path distance.
This implies that $\beta_l \sim a^l$ and that $b_l$ grows exponentially with l (Fig. 3a). In this case,
since $b_l$ increases rapidly with l, the cumulative $B_L(j) = \sum_{l=1}^{L} b_l(j)$ is dominated by the
largest l values and thus obeys a similar scaling, supporting the observations in Fig. 2. For pure
scale-free networks $P(k) = ck^{-\gamma}$ and $p_l(b) \propto (b/\beta_l)^{-\gamma}$ for l > 1. In networks
where the shell size grows as a power law (spatially embedded networks without shortcuts), such as RG,
roadways, etc., $\beta_l \sim l^d$, where d is the embedding dimension, so that $b_l(j) \sim l^d$ and
$B_L \sim L^{d+1}$ (Fig. 3b).

As the contributions of the noise terms $\epsilon_l(j)$ to $\xi_l(j)$ coming from larger shells decrease
with increasing l (their weight decreases as $(l+1)^{-1}$, in addition to the decrease of their magnitude
$|\epsilon_l(j)|$), the quantities $\xi_l(j)$ rapidly converge to a constant. From Eq. (5), for a pair of
nodes i, j: $\ln[b_l(i)/b_l(j)] = \ln(k_i/k_j) + \xi_l(i) - \xi_l(j)$, showing that their relative ranking
by l-betweenness freezes with increasing l. Consequently, $B_L$ and $B_{L+1}$ become more correlated with
increasing L (Fig. 3c, d), and the ranking of the nodes by their BC also freezes (Fig. 3e, f), allowing
early prediction of the top-betweenness nodes. Spatially embedded networks (RG) without shortcuts
represent the worst case, but relative to their diameter the convergence of the ranking is still fast
(Fig. 3f). An important application of top-betweenness predictability is determining the "vulnerability
backbone" (VB) of a graph (crucial for network defense purposes [15, 18]), which is made up of the
smallest fraction of highest-betweenness nodes forming a percolating cluster through the network. Fig. 4
for RG (the worst case) shows that the VB (red subgraph) can be accurately predicted already from the
L = 45 betweenness values (Fig. 4c), compared to the full betweenness values based on the diameter
(D = 195, Fig. 4d).

FIG. 4: Vulnerability backbone in an RG graph (N = 5 × 10^…, ⟨k⟩ = 5) for a) L = 5, b) L = 15,
c) L = 45, d) L = D = 195. Darker red indicates nodes with higher $B_L$. In agreement with Fig. 3f, the
VB is already well approximated at L = 45 (c).

Finally, we note that the scaling behavior can be used to provide a lower bound L* of the diameter, from
observing that finite-size effects appear when the sum of the average shell sizes reaches N:
$\sum_{l=1}^{L^*}\langle z_l\rangle \simeq N$. This allows one to find L* from the scaling behavior of
$p_l$. In particular, for the SocNet L* = 10.

In summary, we have shown that the contributions to centrality measures coming from different length
scales of the geodesics exhibit characteristic scaling in large graphs. Exploiting this universal
property with the methods presented here makes it possible to predict betweenness values, distributions
and ranking at relatively low computational cost.

0826958, HDTRA 201473-35045 and by the Army Research Laboratory, W911NF-09-2-0053. Views and conclusions
are those of the authors, not representing those of the ARL or U.S. Govt.



Electronic address: mercseyr@nd.edu
Electronic address: toro@nd.edu
[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications (Cambridge Univ. Press, 1994).
[2] J. Scott, Social Network Analysis: A Handbook (Sage Publications, 1991).
[3] G. Sabidussi, Psychometrika 31, 581 (1966).
[4] N. E. Friedkin, Amer. J. Soc. 96, 1478 (1991).
[5] S. P. Borgatti and M. G. Everett, Soc. Netw. 28, 466 (2006).
[6] L. C. Freeman, Sociometry 40, 35 (1977).
[7] S. P. Borgatti, Soc. Netw. 27, 55 (2005).
[8] J. M. Anthonisse, Tech. Rep. BN 9/71, Stichting Math. Centr., Amsterdam (1971).
[9] S. Sreenivasan et al., Phys. Rev. E 75, 036105 (2007).
[10] L. Dall'Asta et al., Theor. Comp. Sci. 355, 6 (2006).
[11] L. Dall'Asta et al., Phys. Rev. E 71, 036135 (2005).
[12] K.-I. Goh et al., Phys. Rev. Lett. 87, 278701 (2001).
[13] B. Danila et al., Phys. Rev. E 74, 046114 (2006).
[14] R. Guimerà et al., Phys. Rev. Lett. 89, 248701 (2002).
[15] P. Holme et al., Phys. Rev. E 65, 056109 (2002).
[16] A. E. Motter, Phys. Rev. Lett. 93, 098701 (2004).
[17] A. Vespignani, Science 325, 425 (2009).
[18] L. Dall'Asta et al., J. Stat. Mech. P04006 (2006).
[19] U. Brandes, J. Math. Sociology 25, 163 (2001).
[20] M. E. J. Newman, Phys. Rev. E 64, 016132 (2001).
[21] U. Brandes, Soc. Netw. 30, 136 (2008).
[22] J. D. Noh and H. Rieger, Phys. Rev. Lett. 92, 118701 (2004).
[23] U. Brandes and C. Pich, Int. J. Bifurcation Chaos 17, 2303 (2007).
[24] R. Geisberger et al., in ALENEX, 90 (2008).
[25] M. C. González et al., Nature 453, 779 (2008).
[26] P. Erdős and A. Rényi, Publ. Math. Inst. Hung. Acad. Sci. 5, 17 (1960).
[27] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[28] J. Dall and M. Christensen, Phys. Rev. E 66, 016121 (2002).
[29] M. E. J. Newman et al., Phys. Rev. E 64, 026118 (2001).
[30] J. Shao et al., Phys. Rev. E 80, 036105 (2009).
[31] J.-P. Onnela et al., PNAS 104, 7332 (2007).
[32] M. Seshadri et al., in SIGKDD-08 (2008).



